94 research outputs found

Do we still need supertrees?

Author: A De Queiroz
A Kupczok
Arndt von Haeseler
B Baum
ELL Sonnhammer
J Gatesy
JH Degnan
K Nyakatura
MJ Sanderson
N-PD Nguyen
ORP Bininda-Emonds
Publication venue: BioMed Central
Publication date: 01/01/2012

The updated species-level phylogeny for the carnivores using a supertree approach provides new insights into the evolutionary origin and relationships of carnivores. While the gain in biological knowledge is substantial, the supertree approach is not undisputed. I discuss the principles of supertree methods and the competitor supermatrix approaches. I argue that both methods are important to infer phylogenetic relationships.

Crossref

Springer - Publisher Connector

PubMed Central

New decoding algorithms for Hidden Markov Models using distance measures on labellings

Author: A Krogh
B Brejová
Daniel G Brown
ELL Sonnhammer
GE Tusnady
Jakub Truszkowski
L Käll
L Käll
M Stanke
P Fariselli
R Durbin
SL Cawley
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Existing hidden Markov model decoding algorithms do not focus on approximately identifying the sequence feature boundaries. Results: We give a set of algorithms to compute the conditional probability of all labellings "near" a reference labelling λ for a sequence y for a variety of definitions of "near". In addition, we give optimization algorithms to find the best labelling for a sequence in the robust sense of having all of its feature boundaries nearly correct. Natural problems in this domain are NP-hard to optimize. For membrane proteins, our algorithms find the approximate topology of such proteins with comparable success to existing programs, while being substantially more accurate in estimating the positions of transmembrane helix boundaries. Conclusion: More robust HMM decoding may allow for better analysis of sequence features, in reasonable runtimes.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Algorithm of OMA for large-scale orthology inference

Author: A Alexeyenko
A Bateman
A Schneider
AC Berglund-Sonnhammer
AK Bjorklund
Alexander CJ Roth
AM Altenhoff
AR Mushegian
C Dessimoz
C Dessimoz
C Dessimoz
CEV Storm
Christophe Dessimoz
CM Zmasek
D Fulton
DA Benson
DP Wall
ELL Sonnhammer
Gaston H Gonnet
K Chen
L Jensen
L Li
M Dayhoff
M Farrar
M Gil
M Remm
P Flicek
R Balasubramanian
RA Notebaart
RL Tatusov
RL Tatusov
RTJM van der Heijden
TF DeLuca
TF Smith
WM Fitch
Publication venue: BioMed Central
Publication date: 01/12/2008

Since the publication of our article (Roth, Gonnet, and Dessimoz: BMC Bioinformatics 2008 9: 518), we have noticed several errors, which we correct in the following.

Repository for Publications and Research Data

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

UCL Discovery

MSDmotif: exploring protein sites and motifs

Author: A Golovin
A Golovin
A Prilc
A Prlic
Adel Golovin
AG Murzin
AJ Shepherd
AV Efimov
AV Efimov
BL Sibanda
C Bystroff
CA Orengo
CG Hunter
CH Wu
CT Porter
D Schomburg
DCP Kuhn
DI Stuart
DJ Craik
EJ Milner-White
EJ Milner-White
EJ Milner-White
ELL Sonnhammer
ELL Sonnhammer
H Boutselakis
H Kaur
H Kawasaki
HM Berman
ID Kuntz
J Lee
JD Watson
JD Watson
JYL Questel
KB Li
Kim Henrick
M Clamp
MJ Hartshorn
MR Nelson
N Hulo
ND Rawlings
RD Dowell
RD Finn
S Hayward
S Zhirong
SF Altschul
SF Altschul
T Hubbard
TJ Oldfield
TL Bailey
WJ Duddy
WR Pearson
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: Protein structures have conserved features (motifs) that have a sufficient influence on the protein function. These motifs can be found in sequence as well as in 3D space. Understanding of these fragments is essential for 3D structure prediction, modelling and drug design. The Protein Data Bank (PDB) is the source of this information; however, present search tools have limited 3D options to integrate protein sequence with its 3D structure. Results: We describe here a web application for querying the PDB for ligands, binding sites, small 3D structural and sequence motifs, and the underlying database. Novel algorithms for chemical fragments, 3D motifs, ϕ/ψ sequences, super-secondary structure motifs and for small 3D structural motif association searches are incorporated. The interface provides functionality for visualization, search criteria creation, and sequence and 3D multiple alignment options. MSDmotif is an integrated system where a results page is also a search form. A set of motif statistics is available for analysis. This set includes molecule and motif binding statistics, distribution of motif sequences, occurrence of an amino acid within a motif, correlation of amino acid side-chain charges within a motif and Ramachandran plots for each residue. The binding statistics are presented in association with properties that include a ligand fragment library. Access is also provided through the Distributed Annotation System (DAS) protocol. An additional entry point facilitates XML requests with XML responses. Conclusion: MSDmotif is unique in combining chemical, sequence and 3D data in a single search engine with a range of search and visualisation options. It provides multiple views of data found in the PDB archive for exploring protein structures.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Genomic-Bioinformatic Analysis of Transcripts Enriched in the Third-Stage Larva of the Parasitic Nematode Ascaris suum

Author: A Brand
A Fire
A Krogh
A Sugimoto
AA Aboobaker
AJ Nisbet
AJ Nisbet
AJ Wolstenholme
Alasdair J. Nisbet
Alex Loukas
B Besier
B Gorgoni
B Sonnichsen
BE Campbell
BJ Datu
C Britton
CC Mello
Cinzia Cantacessi
CQ Huang
Cui-Qin Huang
D Kressler
DL Donald
DP Knox
DP Knox
DR Brooks
DW Crompton
ELL Sonnhammer
F Simmer
FW Douvres
FW Douvres
G Gao
H Nielsen
H Nielsen
HA Tissenbaum
HP Fagerholm
HS Yu
J Bethony
J Kass
J Parkinson
JA Powell-Coffman
Jason Mulvenna
JD Bendtsen
JM Foster
JM Moser
JP Boyle
JS Gilleard
KD Murrell
L Lebioda
M Ashburner
M Blaxter
M Jiang
M Mitreva
MA Andrade
Michael Cappello
MK Islam
Ning Chen
P Geldhof
P Horton
PA Cottee
PA Pilitt
Paul W. Sternberg
PR Boag
R Barstead
RL McNeel
Robin B. Gasser
RS Kamath
RS Kamath
Rui-Qing Lin
S Hashmi
S Moller
S Nikolaou
SK Kim
TR Burglin
V Reinke
W Peng
W Peng
W Zhong
WE Pomroy
Weiwei Zhong
Xing-Quan Zhu
YH Yang
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2008

Differential transcription in Ascaris suum was investigated using a genomic-bioinformatic approach. A cDNA archive enriched for molecules in the infective third-stage larva (L3) of A. suum was constructed by suppressive-subtractive hybridization (SSH), and a subset of cDNAs from 3075 clones subjected to microarray analysis using cDNA probes derived from RNA from different developmental stages of A. suum. The cDNAs (n = 498) shown by microarray analysis to be enriched in the L3 were sequenced and subjected to bioinformatic analyses using a semi-automated pipeline (ESTExplorer). Using gene ontology (GO), 235 of these molecules were assigned to ‘biological process’ (n = 68), ‘cellular component’ (n = 50), or ‘molecular function’ (n = 117). Of the 91 clusters assembled, 56 molecules (61.5%) had homologues/orthologues in the free-living nematodes Caenorhabditis elegans and C. briggsae and/or other organisms, whereas 35 (38.5%) had no significant similarity to any sequences available in current gene databases. Transcripts encoding protein kinases, protein phosphatases (and their precursors), and enolases were abundantly represented in the L3 of A. suum, as were molecules involved in cellular processes, such as ubiquitination and proteasome function, gene transcription, protein–protein interactions, and function. In silico analyses inferred the C. elegans orthologues/homologues (n = 50) to be involved in apoptosis and insulin signaling (2%), ATP synthesis (2%), carbon metabolism (6%), fatty acid biosynthesis (2%), gap junction (2%), glucose metabolism (6%), or porphyrin metabolism (2%), although 34 (68%) of them could not be mapped to a specific metabolic pathway. Small numbers of these 50 molecules were predicted to be secreted (10%), anchored (2%), and/or transmembrane (12%) proteins. Functionally, 17 (34%) of them were predicted to be associated with (non-wild-type) RNAi phenotypes in C. elegans, the majority being embryonic lethality (Emb) (13 types; 58.8%), larval arrest (Lva) (23.5%) and larval lethality (Lvl) (47%). A genetic interaction network was predicted for these 17 C. elegans orthologues, revealing highly significant interactions for nine molecules associated with embryonic and larval development (66.9%), information storage and processing (5.1%), cellular processing and signaling (15.2%), metabolism (6.1%), and unknown function (6.7%). The potential roles of these molecules in development are discussed in relation to the known roles of their homologues/orthologues in C. elegans and some other nematodes. The results of the present study provide a basis for future functional genomic studies to elucidate molecular aspects governing larval developmental processes in A. suum and/or the transition to parasitism.

ResearchOnline@JCU

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

ResearchOnline at James Cook University

PubMed Central

Caltech Authors

University of Melbourne Institutional Repository

IgTM: An algorithm to predict transmembrane domains and topology in proteins

Author: B Mathews
C Pasquier
D Angluin
D Angluin
D Lopez
D Lopez
Damián López
DB Searls
DT Jones
E Wallin
EE Pashou
ELL Sonnhammer
EM Gold
GE Tusnády
H Viklund
J Berstel
JE Hopcroft
JM Sempere
L Käll
LR Murphy
M Burset
M Ikeda
M Punta
Marcelino Campos
MM Gromiha
NS Sadovskaya
P Fariselli
P García
P Peris
PG Bagos
Piedachu Peris
R B
S Jayasinghe
S Mitaku
S Möller
T Knuutila
T Li
T Yokomori
T Yokomori
Publication venue: BioMed Central
Publication date: 01/09/2008

Background: Due to their role as receptors or transporters, membrane proteins play a key role in many important biological functions. In our work we used Grammatical Inference (GI) to localize transmembrane segments. Our GI process is based specifically on the inference of Even Linear Languages. Results: We obtained values close to 80% in both specificity and sensitivity. Six datasets have been used for the experiments, considering different encodings for the input sequences. An encoding that includes the topology changes in the sequence (from inside and outside the membrane to it and vice versa) allowed us to obtain the best results. This software is publicly available at http://www.dsic.upv.es/users/tlcc/bio/bio.html. Conclusion: We compared our results with other well-known methods, which obtain a slightly better precision. However, this work shows that it is possible to apply Grammatical Inference techniques in an effective way to bioinformatics problems.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Joint Evolutionary Trees: A Large-Scale Method To Predict Protein Interfaces Based on Sequence Sampling

Author: A Armon
A Prlic
Alessandra Carbone
BW Matthews
CA Innis
CA Innis
CJ Tsai
CT Porter
DR Caffrey
E Kanamori
ELL Sonnhammer
G Cheng
GH Gonnet
H Chen
I Mihalek
JA Studier
JR Bradford
Ladislas A. Trojan
Michael Levitt
O Lichtarge
O Lichtarge
P Chakrabarti
Richard Lavery
RP Bahadur
S Henikoff
S Jones
S Madabushi
S Miller
SF Altschul
SJ Hubbard
Sophie Sacquin-Mora
SS Negi
Stefan Engelen
T Pupko
W Humphrey
WSJ Valdar
Y Ofran
Y Ofran
ZJ Hu
Publication venue: Public Library of Science
Publication date: 01/01/2009

The Joint Evolutionary Trees (JET) method detects protein interfaces, the core residues involved in the folding process, and residues susceptible to site-directed mutagenesis and relevant to molecular recognition. The approach, based on the Evolutionary Trace (ET) method, introduces a novel way to treat evolutionary information. Families of homologous sequences are analyzed through a Gibbs-like sampling of distance trees to reduce effects of erroneous multiple alignment and impacts of weakly homologous sequences on distance tree construction. The sampling method makes sequence analysis more sensitive to functional and structural importance of individual residues by avoiding effects of the overrepresentation of highly homologous sequences and improves computational efficiency. A carefully designed clustering method is parametrized on the target structure to detect and extend patches on protein surfaces into predicted interaction sites. Clustering takes into account residues' physical-chemical properties as well as conservation. Large-scale application of JET requires the system to be adjustable for different datasets and to guarantee predictions even if the signal is low. Flexibility was achieved by a careful treatment of the number of retrieved sequences, the amino acid distance between sequences, and the selective thresholds for cluster identification. An iterative version of JET (iJET) that guarantees finding the most likely interface residues is proposed as the appropriate tool for large-scale predictions. Tests are carried out on the Huang database of 62 heterodimer, homodimer, and transient complexes and on 265 interfaces belonging to signal transduction proteins, enzymes, inhibitors, antibodies, antigens, and others. A specific set of proteins chosen for their special functional and structural properties illustrate JET behavior on a large variety of interactions covering proteins, ligands, DNA, and RNA. 
JET is compared at a large scale to ET and to Consurf, Rate4Site, siteFiNDER|3D, and SCORECONS on specific structures. A significant improvement in performance and computational efficiency is shown.

Crossref

HAL-Inserm

Directory of Open Access Journals

PubMed Central

CLUSS: Clustering of protein sequences based on a new similarity measure

Author: A Krause
Abdellali Kelil
AJ Enright
Alain Fleury
C Notredame
D Higgins
ELL Sonnhammer
F Titgemeyer
G Reinert
G Yona
H Lodish
IV Tetko
J Felsenstein
J Heringa
J Rocha
JD Thompson
JD Thompson
JH Ward
JH Ward
JS Varré
K Katoh
K Sjölander
K Sjölander
M Ike
M Kimura
MO Dayhoff
MY Leung
N Côté
N Wicker
P Pipenbacher
R Jothi
RC Edgar
RC Edgar
RO Duda
Ryszard Brzezinski
S Fanning
S Henikoff
S Karlin
S Karlin
S Karlin
S Vinga
SF Altschul
SF Altschul
Shengrui Wang
T Fukamizo
T Ishimizu
V Batagelj
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: The rapid burgeoning of available protein data makes the use of clustering within families of proteins increasingly important. The challenge is to identify subfamilies of evolutionarily related sequences. This identification reveals phylogenetic relationships, which provide prior knowledge to help researchers understand biological phenomena. A good evolutionary model is essential to achieve a clustering that reflects the biological reality, and an accurate estimate of protein sequence similarity is crucial to the building of such a model. Most existing algorithms estimate this similarity using techniques that are not necessarily biologically plausible, especially for hard-to-align sequences such as proteins with different domain structures, which cause many difficulties for the alignment-dependent algorithms. In this paper, we propose a novel similarity measure based on matching amino acid subsequences. This measure, named SMS for Substitution Matching Similarity, is especially designed for application to non-aligned protein sequences. It allows us to develop a new alignment-free algorithm, named CLUSS, for clustering protein families. To the best of our knowledge, this is the first alignment-free algorithm for clustering protein sequences. Unlike other clustering algorithms, CLUSS is effective on both alignable and non-alignable protein families. In the rest of the paper, we use the term "phylogenetic" in the sense of "relatedness of biological functions". Results: To show the effectiveness of CLUSS, we performed an extensive clustering on the COG database. To demonstrate its ability to deal with hard-to-align sequences, we tested it on the GH2 family. In addition, we carried out experimental comparisons of CLUSS with a variety of mainstream algorithms. These comparisons were made on hard-to-align and easy-to-align protein sequences. The results of these experiments show the superiority of CLUSS in yielding clusters of proteins with similar functional activity. Conclusion: We have developed an effective method and tool for clustering protein sequences to meet the needs of biologists in terms of phylogenetic analysis and prediction of biological functions. Compared to existing clustering methods, CLUSS more accurately highlights the functional characteristics of the clustered families. It provides biologists with a new and plausible instrument for the analysis of protein sequences, especially those that cause problems for the alignment-dependent algorithms.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Testing the Ortholog Conjecture with Comparative Functional Genomic Data from Mammals

Author: A Alexeyenko
A Kuzniar
AI Su
AJ Vilella
AM Schnoes
Andrey Rzhetsky
B Rost
B Sennblad
BE Engelhardt
BY Liao
BY Liao
CB Bridges
CB Bridges
CL McGrath
CM Zmasek
D Lee
DL Des Marais
DM Martin
E Zuckerkandl
ELL Sonnhammer
EV Koonin
G Glazko
G Shi
HJ Muller
JA Eisen
JA Tennessen
K Dolinski
KD Makova
L Huminiecki
M Goodman
M Kimura
M Lynch
Matthew W. Hahn
MV Han
MV Han
MW Hahn
N Goldman
Nathan L. Nehrt
P Katz
P Radivojac
Predrag Radivojac
R Rentzsch
RA Studer
RA Studer
RD Chen
RL Tatusov
RS Datta
S Addou
S Mika
S Ohno
SG Stephens
T Gabaldon
T Gabaldon
T Hawkins
T Hulsen
W-H Li
WH Li
WM Fitch
WM Fitch
Wyatt T. Clark
ZD Zhang
Publication venue: Public Library of Science
Publication date: 01/01/2011

A common assumption in comparative genomics is that orthologous genes share greater functional similarity than do paralogous genes (the “ortholog conjecture”). Many methods used to computationally predict protein function are based on this assumption, even though it is largely untested. Here we present the first large-scale test of the ortholog conjecture using comparative functional genomic data from human and mouse. We use the experimentally derived functions of more than 8,900 genes, as well as an independent microarray dataset, to directly assess our ability to predict function using both orthologs and paralogs. Both datasets show that paralogs are often a much better predictor of function than are orthologs, even at lower sequence identities. Among paralogs, those found within the same species are consistently more functionally similar than those found in a different species. We also find that paralogous pairs residing on the same chromosome are more functionally similar than those on different chromosomes, perhaps due to higher levels of interlocus gene conversion between these pairs. In addition to offering implications for the computational prediction of protein function, our results shed light on the relationship between sequence divergence and functional divergence. We conclude that the most important factor in the evolution of function is not amino acid sequence, but rather the cellular context in which proteins act.

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Osiris: accessible and reproducible phylogenetic and phylogenomic analyses within the Galaxy workflow management system

Author: A Loytynoja
A Stamatakis
AJ Drummond
B Giardine
B Ludascher
B Misof
Celia K C Churchill
CO Webb
D Darriba
D Posada
DG MacArthur
DP Faith
E Afgan
E Afgan
E Lord
ELL Sonnhammer
F Abascal
F Nardi
G Talavera
H Shimodaira
I Ebersberger
I Letunic
J Evans
K Katoh
K Tamura
Karl B Lopker
L Liu
L Liu
L Liu
LS Kubatko
M Abouelhoda
M Sabrina Pankey
Markos A Alexandrou
MV Han
NP Brown
O Sakarya
P Kuck
RA Vos
RC Edgar
RD Finn
Roger Ngo
SA Berger
SA Smith
SV Edwards
T Oinn
TH Oakley
Todd H Oakley
William Chen
WP Maddison
WP Maddison
Publication venue: Springer Science and Business Media LLC
Publication date

Crossref